Journal of Computer and Knowledge Engineering, Vol. 1, No. 2, 2018. 


DOI: 10.22067/cke.v1i2.61798 


Particle Filter based Target Tracking in Wireless Sensor Networks 
using Support Vector Machine 


Ahmad Namazi Nik 


Abstract: Target tracking is estimating the state of moving 
targets using noisy measurements obtained at a single 
observation point or node. Particle filters or sequential 
Monte Carlo methods use a set of weighted state samples, 
called particles, to approximate the posterior probability 
distribution in a Bayesian setup. During the past few years, 
Particle Filters have become very popular because of their 
ability to process observations represented by nonlinear 
state-space models where the noise of the model can be non- 
Gaussian. There are many Particle Filter methods, and 
almost all of them are based on three operations: particle 
propagation, weight computation, and resampling. One of 
the main limitations of the previously proposed schemes is 
that their implementation in a wireless sensor network 
demands prohibitive communication capability since they 
assume that all the sensor observations are available to every 
processing node in the weight update step. In this paper, we 
use a machine learning technique called support vector 
machine to overcome this drawback and improve the energy 
consumption of sensors. Support Vector Machine (SVM) is 
a classifier which attempts to find a hyperplane that divides 
two classes with the largest margin. Given labeled training 
data, SVM outputs an optimal hyperplane which categorizes 
new examples. The training examples that are closest to the 
hyperplane are called support vectors. With our approach, 
sensor observations are compressed and only the support 
vectors are communicated between neighboring sensors, 
which reduces the communication cost. We use the 
LIBSVM library in our work and MATLAB to plot the 
results, and we compare the proposed protocol with the 
centralized particle filter (CPF) and distributed particle filter 
(DPF) algorithms. Simulation results show a significant 
reduction in the amount of data transmission over the 
network. 

1. Introduction 

Target tracking is one of the most important applications of 
wireless sensor networks. Examples include security and 
surveillance [1], environmental monitoring [2] and tracking 
tasks [3]. Target tracking is the estimation of the current state 
and prediction of future states of a target based on 
measurements received from a sensor that is observing it. 
The limited on-board resources of the sensor node and the 
limited wireless bandwidth are the major constraints of 
performing target tracking in wireless sensor networks. In 
order to save resources, target tracking should be 
implemented in a distributed way. Distributed computation 
has found very successful applications in sensor networks, 
particularly when a powerful central unit is not available. 

Before particle filtering methods became popular, the 
Kalman filter was the standard method for solving state 
space models [4]. The Kalman filter can be applied to 
optimally solve a linear Gaussian state space model. When 
linearity or Gaussian conditions do not hold, its variants, i.e. 
the extended Kalman filter and the unscented Kalman filter, 
can be used. However, for highly nonlinear and non- 
Gaussian problems they fail to provide a reasonable estimate. 

Particle filtering techniques offer an alternative method. 
They work online to approximate the marginal distribution 
of the latent process as observations become available. 
Importance sampling is used at each point in time in order to 
approximate the distribution with a set of discrete values, 
known as particles, each with a corresponding weight. There 
are several papers and books which have presented detailed 
reviews of particle filters and their applications [5-12]. 

In this work we tackle the problem of implementing a 
distributed particle filter (DPF) and make use of the support 
vector machine, a well-known machine learning 
classification method, to compress the measurements 
collected by the processing nodes and thus reduce 
communication costs. 

The rest of the paper is organized as follows. In Section 2, 
a brief review of prior related works on target tracking is 
presented. In Section 3 we introduce the problem of target 
tracking in the context of Bayesian filtering and describe the 
solution to the nonlinear filtering problem with a centralized 
PF. In Section 4 we provide a formal description of the DPF 
algorithm. Section 5 introduces support vector machines. In 
Section 6 we provide details of the proposed method. 
Simulation and experimental results are presented and 
discussed in Section 7 and, finally, Section 8 is devoted to 
conclusions. 


2. Related Works 

Target tracking has many real-life applications such as 
battlefield surveillance, detection of illegal border crossings, 
gas leakage, fire spread, and wildlife monitoring. 

Various taxonomies of target tracking algorithms have 
been proposed in the literature and there is no standardized 
or predefined classification. Some works have studied 
tracking algorithms according to the security aspect [13] 
while others have considered energy efficiency [14], fault 
tolerance, mobility, accuracy, and so on [15]. 

A comparative study of target tracking with the Kalman filter 
(KF), the extended Kalman filter (EKF) and the particle filter 
(PF) using received signal strength measurements is reported 
in [16]; the simulation results show that the PF outperforms 
the KF and EKF in terms of accuracy and root mean square 
error (RMSE). 

The application of PFs in WSNs is challenging due to the 
limited resources of WSNs. Centralized particle filters (CPFs) 
have some problems such as consuming significant energy 
and vulnerability as a single point of failure. Distributed 
particle filters (DPFs) were studied as a response to these 
problems, in particular, to offload the computation from the 
central unit [17]. 

Particle filtering for target tracking in WSNs has already 
attracted some attention, including a body of work in 
distributed methods [18]. Its relation with agent networks has 
also been explored in [19]. 

In [20], a fully decentralized particle filtering algorithm 
for cooperative blind equalization is introduced. The 
technique is proper, in the sense that it does not make any 
approximations in the computation of the importance 
weights of the particles. However, the scheme is applicable 
only when the state signal is discrete, and would be infeasible 
in terms of computation and communication among nodes. 
In [21], the communication load is reduced using 
quantization and parametric approximations of densities. A 
similar parametric approach is applied in [18] to further 
simplify communications. 

The work reported in [22] provides a generalized approach 
for approximating global likelihood through a consensus 
filter. It approximates log-likelihood by a polynomial 
function, and the sensors exchange only the coefficients of 
the polynomial function to compute global likelihood. 

The authors in [23] proposed a distributed particle filtering 
algorithm with the objective of reducing the overhead data 
that is communicated among the sensors. In their algorithm, 
the sensors exchange information to collaboratively compute 
the global likelihood function that encompasses the 
contribution of the measurements towards building the 
global posterior density of the unknown location parameters. 
Each sensor uses its own measurement to compute its local 
likelihood function and approximates it using a Gaussian 
function. The sensors then propagate only the mean and 
covariance of their approximated likelihood functions to 
other sensors, thereby reducing the communication overhead. 
The global likelihood function is computed collaboratively 
from the parameters of the local likelihood functions using 
an average consensus filter or a forward-backward 
propagation information exchange strategy. 

In [24] a distributed particle filter is designed and it is 
shown that the difference in accuracy between the proposed 
DPF and a centralized filter with the same total number of 
particles is less than 2 cm, while the DPF with four 
processing nodes is over four times faster than an equivalent 
centralized version. Equivalently, the same performance can 
be obtained on less powerful hardware. The 
main limitation of that scheme is that every node performing 
a subset of the computations of the PF should have access to 
all the observations (i.e., all the measurements collected by 
the WSN at the current time step) in order to guarantee that 
the particle weights are proper and, therefore, the resulting 
estimators are consistent. 


3. Nonlinear Filtering in State-Space System 

3-1. Bayesian Filtering 

Consider the Markov state-space random model with 
conditionally independent observations [25, 26] described by 
the triplet: 

$$p(x_0), \quad p(x_t|x_{t-1}), \quad p(y_t|x_t), \quad t = 1, 2, \ldots \qquad (1)$$

We denote the states and the observations up to time t by 
$x_{0:t} \triangleq \{x_0, \ldots, x_t\}$ and $y_{1:t} \triangleq \{y_1, \ldots, y_t\}$, respectively. $p(x_0)$ 
is the prior probability density function (pdf) of the state, the 
transition density $p(x_t|x_{t-1})$ describes the (random) 
dynamics of the process $x_t$ and the conditional pdf $p(y_t|x_t)$ 
describes how the observations are related to the state and it 
is usually referred to as the likelihood of $x_t$. The goal of a 
stochastic filtering algorithm is to recursively estimate the 
posterior distribution $p(x_t|y_{1:t})$, $t \geq 1$. 

Suppose that the required pdf $p(x_{t-1}|y_{1:t-1})$ at time $t-1$ 
is available. The prediction stage obtains the prior pdf of the 
state at time t via: 

$$p(x_t|y_{1:t-1}) = \int p(x_t|x_{t-1})\, p(x_{t-1}|y_{1:t-1})\, dx_{t-1} \qquad (2)$$

At time step t, an observation $y_t$ becomes available, and it 
may be used to update the prior (update stage) via Bayes' 
rule: 

$$p(x_t|y_{1:t}) \propto p(y_t|x_t)\, p(x_t|y_{1:t-1}) \qquad (3)$$


Eqs. (2) and (3) form the basis for the optimal Bayesian 
solution [6]. If the system of Eq. (1) is linear and Gaussian 
then $p(x_t|y_{1:t})$ is Gaussian and can be obtained exactly using 
the Kalman filter algorithm [27]. If the state space is discrete 
and finite, exact solutions can also be computed [25]. 
However, if any of the pdfs in (1) is non-Gaussian, or the 
system is nonlinear, we have to resort to suboptimal 
algorithms in order to approximate the filter pdf $p(x_t|y_{1:t})$. 
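
The discrete, finite case mentioned above can be illustrated with a minimal sketch in which the integral of Eq. (2) becomes a sum. The three-cell state space, the transition matrix and the likelihood values are hypothetical and chosen only for illustration; they do not come from the experiments in this paper.

```python
def predict(prior, transition):
    """Prediction stage, Eq. (2), with the integral replaced by a sum.

    transition[i][j] is p(x_t = j | x_{t-1} = i).
    """
    n = len(prior)
    return [sum(transition[i][j] * prior[i] for i in range(n)) for j in range(n)]


def update(predicted, likelihood):
    """Update stage, Eq. (3): multiply by the likelihood, then normalize."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical 3-cell corridor: the target stays or moves one cell to the right.
transition = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
prior = [1.0, 0.0, 0.0]        # p(x_0): the target starts in cell 0
likelihood = [0.1, 0.8, 0.1]   # assumed p(y_t | x_t = j) for the received y_t

predicted = predict(prior, transition)
posterior = update(predicted, likelihood)
```

In this toy case the posterior concentrates on cell 1, since the prediction allows cells 0 and 1 and the observation favors cell 1.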


3-2. Particle Filtering 

Particle Filters, also known as sequential Monte Carlo 
methods, are simulation based algorithms that yield 
estimates of the state based on a random point-mass (or 
"particle") representation of the probability measure with 
density $p(x_t|y_{1:t})$ [28-30]. Table 1 shows the standard 
particle filter algorithm. We refer to it as centralized in order 
to make explicit that it requires a central unit that collects all 
the observations together, generates all the particles and 
processes them together. The resampling step randomly 
eliminates samples with low importance weights and 
replicates samples with high importance weights in order to 
avoid the degeneracy of the importance weights over time 
[26, 31]. 


Table 1: The Centralized Particle Filter (CPF) algorithm 


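Initialize: At time t = 0 
  For m = 1, ..., M 
    sample $x_0^{(m)}$ from the prior $p(x_0)$ 

Recursive step: for t > 0 
  For m = 1, ..., M 
    draw $x_t^{(m)} \sim p(x_t|x_{t-1}^{(m)})$ and set $x_{0:t}^{(m)} = \{x_t^{(m)}, x_{0:t-1}^{(m)}\}$ 
    compute importance weights $w_t^{(m)*} = p(y_t|x_t^{(m)})$ 
  Normalize weights $w_t^{(m)} = w_t^{(m)*} / \sum_{i=1}^{M} w_t^{(i)*}$ 
  Resample the weighted sample $\{x_{0:t}^{(m)}, w_t^{(m)}\}_{m=1}^{M}$ to obtain 
  an unweighted sample $\{\tilde{x}_{0:t}^{(m)}\}_{m=1}^{M}$ 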
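
The steps of Table 1 can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this paper: the scalar random-walk state model, the Gaussian observation model, and the parameter names `q_std` and `r_std` are assumptions made only to keep the example self-contained.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def bootstrap_pf(observations, num_particles=500, q_std=0.5, r_std=1.0, seed=0):
    """Centralized bootstrap particle filter for an assumed scalar model:
    x_t = x_{t-1} + N(0, q_std^2),  y_t = x_t + N(0, r_std^2).
    """
    rng = random.Random(seed)
    # Initialize: sample from the prior p(x_0), assumed to be N(0, 1).
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    estimates = []
    for y in observations:
        # Propagation: draw from the transition pdf p(x_t | x_{t-1}).
        particles = [x + rng.gauss(0.0, q_std) for x in particles]
        # Weight computation: the likelihood p(y_t | x_t) of each particle.
        weights = [gaussian_pdf(y, x, r_std) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # Numerical underflow: fall back to uniform weights.
            weights = [1.0 / num_particles] * num_particles
        else:
            weights = [w / total for w in weights]
        # Posterior-mean estimate from the weighted sample.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling yields an unweighted sample.
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return estimates


estimates = bootstrap_pf([0.2, 0.5, 1.1, 1.4, 2.0])
```

Resampling at every step, as here, is the simplest choice; it matches the table but is not the only option.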

4. Distributed Particle Filtering 

In this paper, we implement a distributed particle filter with 
nodes that can operate as processing elements (PEs) on a 
wireless sensor network. Each PE is a low-powered device 
that has to perform sensing, computation and radio 
communication tasks while running on batteries. A common 
assumption in other proposed schemes is that all 
observations can be readily made available to all PEs in the 
system [24, 32-33]. Such capacity cannot be taken for 
granted in a WSN, where the observations are collected 
locally by the nodes and communications are necessarily 
constrained because of energy consumption. This issue will 
be addressed in subsequent sections. 

Assume we have N processing nodes in the network and 
each is capable of running a separate PF with K particles (we 
ignore any non-processing nodes for now since they do not 
run particle filters). The total number of particles distributed 
over the network is M = NK. In particular, after the 
completion of a full recursive step of the distributed PF at 
time $t-1$, the n-th PE should hold the set 
$\{x_{t-1}^{(n,k)}, w_{t-1}^{(n,k)*}, W_{t-1}^{(n)*}\}_{k=1,\ldots,K}$, where $x_{t-1}^{(n,k)}$ is the k-th particle 
at the n-th PE, $w_{t-1}^{(n,k)*}$ is the corresponding non-normalized 
importance weight, and $W_{t-1}^{(n)*}$ is the non-normalized 
aggregated weight of PE n. 

Each PF running locally on a node involves the usual steps 
of drawing samples, computing weights and resampling. The 
generation of new particles, the update of the importance 
weights and the resampling step are taken strictly locally, 
without interaction between different nodes. To be specific, 
assume that the transition pdf of model (1) is used as an 
importance function and that the observation vector $y_t$ is 
available at every node. Then, at the n-th PE, and for k = 
1, ..., K, $x_t^{(n,k)}$ is drawn from the pdf $p(x_t|x_{t-1}^{(n,k)})$, and the 
corresponding non-normalized weight is computed as 

$$w_t^{(n,k)*} = w_{t-1}^{(n,k)*}\, p(y_t|x_t^{(n,k)}).$$


Hence, the information stored by the n-th node at this point 
becomes $\{x_t^{(n,k)}, w_t^{(n,k)*}\}_{k=1,\ldots,K}$ and the aggregated weight is 

$$W_t^{(n)*} = \sum_{k=1}^{K} w_t^{(n,k)*}.$$

Next, a resampling step is taken locally by each PE. 
Assuming a multinomial resampling algorithm, we assign, 
for k = 1, ..., K, $\tilde{x}_t^{(n,k)} = x_t^{(n,j)}$ with probability $w_t^{(n,j)}$, 
$j \in \{1, \ldots, K\}$, where $w_t^{(n,j)} = w_t^{(n,j)*} / W_t^{(n)*}$, j = 1, ..., K, are 
the locally normalized importance weights. After resampling, 
the particles at the n-th PE are equally weighted. 

In the estimation step, we obtain local estimates of the target 
position at any node as: 

$$\hat{x}_t^{(n)} = E(x_t|y_{1:t}) = \int x_t\, p(x_t|y_{1:t})\, dx_t \approx \sum_{k=1}^{K} w_t^{(n,k)} x_t^{(n,k)} \qquad (5)$$

where $w_t^{(n,k)} = w_t^{(n,k)*} / W_t^{(n)*}$, k = 1, ..., K, are the 
locally normalized importance weights. 

Global estimates can be easily computed by a linear 
combination of local estimates. In order to obtain a global 
estimate of the target position, each node n in the network should 
transmit its local estimate $\hat{x}_t^{(n)}$ and its aggregated weight 
$W_t^{(n)*}$ to a prescribed node (working as a fusion center) 
where global estimates can be computed as: 

$$\hat{x}_t = \sum_{n=1}^{N} W_t^{(n)} \hat{x}_t^{(n)} \qquad (6)$$

where $W_t^{(n)} = W_t^{(n)*} / \sum_{i=1}^{N} W_t^{(i)*}$ is the globally 
normalized aggregated weight of the n-th node. 
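
The fusion rule of Eqs. (5) and (6) can be sketched as follows. The particle values and weights are made-up numbers, and scalar positions are used instead of two-dimensional ones purely for brevity; the function names are illustrative.

```python
def local_estimate(particles, weights):
    """Eq. (5): weighted mean with locally normalized weights.

    Returns the local estimate and the non-normalized aggregated weight.
    """
    aggregated = sum(weights)
    estimate = sum((w / aggregated) * x for w, x in zip(weights, particles))
    return estimate, aggregated


def global_estimate(local_estimates, aggregated_weights):
    """Eq. (6): combine local estimates with globally normalized weights."""
    total = sum(aggregated_weights)
    return sum((W / total) * x for W, x in zip(aggregated_weights, local_estimates))


# Two hypothetical PEs, each holding K = 2 particles with non-normalized weights.
est_1, agg_1 = local_estimate([1.0, 3.0], [0.2, 0.2])
est_2, agg_2 = local_estimate([4.0, 6.0], [0.1, 0.3])
fused = global_estimate([est_1, est_2], [agg_1, agg_2])

# For comparison: the weighted mean over all four particles pooled together.
pooled = (0.2 * 1.0 + 0.2 * 3.0 + 0.1 * 4.0 + 0.3 * 6.0) / 0.8
```

In this example the fused value agrees with the pooled weighted mean, so each node only needs to send two quantities to the fusion center rather than its particles.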


5. Support Vector Machine 

Support vector machines discriminate two classes by fitting 
an optimal linear separating hyperplane to the training 
samples of two classes in a multidimensional feature space. 
The optimization problem being solved aims to maximize 
the margins between the optimal linear separating 
hyperplane and the closest training samples which are called 
support vectors (Figure 1). In a linearly non-separable case, 
the input data are mapped into a high-dimensional space in 
which the new distribution of the samples enables the fitting 
of a linear hyperplane [34]. 


[Figure 1: separating hyperplane with normal vector $w$ and a margin of width $2/\|w\|$.] 


Fig 1. An example of classification of two classes by SVM. The 
support vectors are filled. 


Assume a training set S consisting of n points 
of the form: 

$$S = \{(x_i, y_i) \mid x_i \in \mathbb{R}^d,\ y_i \in \{+1, -1\}\}_{i=1}^{n} \qquad (7)$$


where $y_i$ indicates the class to which point $x_i$ belongs 
and each $x_i$ is a d-dimensional real vector. The goal of SVM 
is to define a hyperplane which divides S, such that all the 
points with the same label are on the same side of the 
hyperplane while maximizing the distance between the two 
classes +1, -1 and the hyperplane. The boundary can be 
expressed as $w \cdot x + b = 0$, where w is the normal vector to 
the hyperplane. The parameter $b/\|w\|$ determines the 
perpendicular distance from the hyperplane to the origin 
along the normal vector w, and $\|w\|$ is the Euclidean norm of 
w. The data points nearest to the boundary are used to define 
the margins between the two classes and are known as 
support vectors. At the margins, where the support vectors 
are located, the equations for classes +1 and -1, respectively, 
are: 

$$w \cdot x + b = +1, \qquad w \cdot x + b = -1 \qquad (7)$$

and the following decision function can be used to classify 
any data point in either class +1 or -1: 

$$f(x) = \mathrm{sign}(w \cdot x + b) \qquad (8)$$


The margin between the two classes, measured 
perpendicular to the hyperplane, is $2/\|w\|$, so we want to 
minimize $\|w\|$. In a linearly separable case, the support 
vector machine looks for the separating hyperplane with the 
largest margin. Suppose that all the training data satisfy these 
constraints: 

$$w \cdot x_i + b \geq +1 \quad \forall x_i \text{ with } y_i = +1 \qquad (9)$$

$$w \cdot x_i + b \leq -1 \quad \forall x_i \text{ with } y_i = -1 \qquad (10)$$

These can be combined into one inequality: 

$$y_i (w \cdot x_i + b) \geq 1, \quad i = 1, 2, \ldots, N \qquad (11)$$


where N is the number of training samples. Following [28], 
it is convenient to use the Lagrangian formulation of the problem. 
Introducing Lagrange multipliers $\alpha_i \geq 0$, i = 
1, 2, ..., N, one for each of the constraints in Eq. (11), we get 
the following Lagrangian: 

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i y_i (w \cdot x_i + b) + \sum_{i=1}^{N} \alpha_i \qquad (12)$$


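We must now minimize Eq. (12) with respect to w and b, 
and maximize it with respect to $\alpha_i$. Thus: 

$$\frac{\partial}{\partial w} L(w, b, \alpha) = 0, \qquad \frac{\partial}{\partial b} L(w, b, \alpha) = 0 \qquad (13)$$

which leads to: 

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (14)$$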


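Substituting Eq. (14) into Eq. (12) yields the dual 
quadratic optimization problem: 

Maximize 

$$L_D = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j \qquad (15)$$

Subject to 

$$\alpha_i \geq 0, \quad i = 1, 2, \ldots, N, \qquad (16)$$

$$\sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (17)$$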


On substitution of Eq. (14) into the decision function (8) 
we obtain an expression which can be evaluated in terms of 
dot products between the pattern to be classified and the 
support vectors: 

$$f(x) = \mathrm{sign}\left(\sum_{i=1}^{N} \alpha_i y_i (x_i \cdot x) + b\right) \qquad (18)$$

The dot product can therefore be replaced with a nonlinear 
kernel function, thereby performing large margin separation 
in the feature space of the kernel. 
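
A small sketch may help to make Eq. (18) and the kernel substitution concrete. The two support vectors, the multipliers and the `gamma` value below are a hand-picked toy example, not values produced by LIBSVM; for the linear case they satisfy Eq. (14) with w = (1, 0) and b = 0.

```python
import math


def linear_kernel(u, v):
    """Plain dot product, as in Eq. (18)."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||u - v||^2); gamma is assumed."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decide(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Decision function of Eq. (18) with a pluggable kernel.

    A score of exactly zero is mapped to +1 by convention.
    """
    score = b + sum(
        alpha * y * kernel(sv, x)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if score >= 0 else -1


# Toy problem: one support vector per class, symmetric about the origin.
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
labels = [1, -1]
alphas = [0.5, 0.5]   # sum(alpha_i * y_i) = 0, as required by Eq. (14)
b = 0.0

label_a = decide((2.0, 3.0), support_vectors, labels, alphas, b)
label_b = decide((-0.5, 1.0), support_vectors, labels, alphas, b)
label_c = decide((2.0, 3.0), support_vectors, labels, alphas, b, kernel=rbf_kernel)
```

Only the support vectors, their labels, the multipliers and b are needed at prediction time, which is the property exploited for compression in this paper. Reusing the same multipliers with the RBF kernel is done here only to show the substitution; in practice they would be obtained by training with that kernel.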


6. Using Support Vector Machine with Distributed 
Particle Filter 

We use LIBSVM [35] in our work. LIBSVM is a library for 
Support Vector Machines and has gained wide popularity in 
machine learning and many other areas [36]. 

The package is available at 
http://www.csie.ntu.edu.tw/~cjlin/libsvm. We also use 
MATLAB to plot the results. 

A classification task usually involves separating data into 
training and testing sets. Each instance in the training set 
contains one “target value” (i.e. the class labels) and several 
“attributes” (i.e. the features or observed variables). The goal 
of SVM is to produce a model (based on the training data) 
which predicts the target values of the test data given only 
the test data attributes. Our idea is to make use of support 
vector machine as a data classification technique in our work 
to reduce communications among the nodes. 

As mentioned in Section 4, the weight update step assumes 
that the observation vector $y_t$ is available at every 
node, which requires communication among the nodes. We 
use SVM to reduce this communication. SVMs only 
consider the points near the margin (the support vectors) 
instead of all data points. In our setup, the 
observation coming from sensor j at time t, denoted $y_{j,t}$, is 
modeled as a binary observation, so our SVM has two 
classes. Each sensor has two attributes, which are the 
coordinates of its position. 

Scaling before applying SVM is very important. The main 
advantage of scaling is to avoid attributes in greater numeric 
ranges dominating those in smaller numeric ranges. Another 
advantage is to avoid numerical difficulties during the 
calculation. Because kernel values usually depend on the 
inner products of feature vectors, e.g. the linear kernel and 
the polynomial kernel, large attribute values might cause 
numerical problems. In [37] it is recommended to linearly 
scale each attribute to the range [-1, +1] or [0, 1]. We have 
to use the same method to scale both training and testing data. 
For example, suppose that we scaled the first attribute of 
training data from [-10, +10] to [-1, +1]. If the first attribute 
of testing data lies in the range [-11, +8], we must scale the 
testing data to [-1.1, +0.8]. There are four basic kernel 
functions in SVM: linear, polynomial, radial basis 
function (RBF) and sigmoid. In our work we have used the RBF 
kernel in the training step since it has fewer numerical 
difficulties and performs better in nonlinear cases. 
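
The scaling rule described above can be sketched as follows. The helper name `fit_scaler` and the training values are hypothetical; the point is only that the scaling map is fitted on the training data and then reused unchanged for the testing data.

```python
def fit_scaler(training_values, lower=-1.0, upper=1.0):
    """Return a function that linearly maps the training range to [lower, upper].

    Assumes the training values are not all identical.
    """
    lo, hi = min(training_values), max(training_values)

    def scale(value):
        return lower + (upper - lower) * (value - lo) / (hi - lo)

    return scale


# First attribute of the training data spans [-10, +10].
training = [-10.0, -4.0, 0.0, 10.0]
scale = fit_scaler(training)

# The testing data are scaled with the SAME map, so they may leave [-1, +1].
testing = [-11.0, 8.0]
scaled_testing = [scale(v) for v in testing]
```

With these numbers the testing values map to approximately -1.1 and +0.8, matching the example in the text.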

When the training is done, the support vectors are known. 
Once the support vectors are determined, the rest of the 
feature set can be discarded, since the support vectors contain 
all the necessary information for the classifier. We propagate 
only the observations corresponding to these support vectors ($y_t^{+}$) 
rather than the whole $y_t$ in the network. Then, in the weight 
update step of our distributed particle filter, every processing 
element can obtain the observations of the other sensors by running 
the final step of the SVM, namely prediction. In other 
words, in the prediction step of the SVM, we obtain the observation 
vector $y_t$ from the vector $y_t^{+}$. Table 2 summarizes the DPF 
algorithm investigated in this paper. 

Table 2. Distributed Particle Filter (DPF) algorithm 

Initialize: At time t = 0, for n = 1, ..., N 
  Draw $x_0^{(n,k)}$, for k = 1, ..., K, from the prior $p(x_0)$ 
  Assign $w_0^{(n,k)*} = 1/K$ for all k, set $W_0^{(n)*} = 1$ 
  Build the set $\{x_0^{(n,k)}, w_0^{(n,k)*}, W_0^{(n)*}\}_{k=1}^{K}$ 

Recursive step: At time t > 0, start from the set $\{x_{t-1}^{(n,k)}, w_{t-1}^{(n,k)*}, W_{t-1}^{(n)*}\}_{k=1}^{K}$. Then, for n = 1, ..., N 
  Sampling: Draw $x_t^{(n,k)}$ from $p(x_t|x_{t-1}^{(n,k)})$, for k = 1, ..., K 
  Weight update: $w_t^{(n,k)*} = w_{t-1}^{(n,k)*}\, p(y_t|x_t^{(n,k)})$ 
  Estimation: compute the desired output, such as the expected value 
  Resampling: to obtain the set $\{\tilde{x}_t^{(n,k)}, \tilde{w}_t^{(n,k)*}, W_t^{(n)*}\}_{k=1}^{K}$, where $\tilde{w}_t^{(n,k)*} = W_t^{(n)*}/K$ for k = 1, ..., K 

7. Simulation and Experimental Results 

The goal of our work is to implement a DPF for target 
tracking in a wireless sensor network and use SVM to 
compress measurements collected by these sensors. Our 
experimental scenario is shown in Figure 2. It is a room with 
10 nodes (each equipped with a light sensor) enclosing 
an area of 4×6 m² with a single source of natural light (a 
window). Modeling the environment and 
translating the disturbances caused by the target in the sensor 
readings into distance measurements is very complex. 
Instead, we focus on obtaining binary observations: 1 if 
the target is in the detection zone and 0 otherwise. 
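
A binary observation model of this kind can be sketched as follows. The circular detection zone, its radius and the sensor coordinates are assumptions made purely for illustration; the actual zones in the experiment depend on the light source and the room geometry.

```python
import math


def binary_observation(target, sensor, radius):
    """Return 1 if the target lies inside the sensor's detection zone, else 0.

    The zone is ASSUMED to be a disc of the given radius around the sensor.
    """
    distance = math.hypot(target[0] - sensor[0], target[1] - sensor[1])
    return 1 if distance <= radius else 0


# Hypothetical sensor positions (in meters) on the edges of a 4 x 6 m room.
sensors = [(0.0, 1.0), (0.0, 3.0), (0.0, 5.0), (4.0, 3.0)]
target = (1.0, 3.0)

# Observation vector y_t: one binary entry per sensor.
y_t = [binary_observation(target, s, radius=1.5) for s in sensors]
```

Each node therefore contributes a single bit per time step, and the pairs of sensor position and bit form the labeled points on which the two-class SVM operates.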


Fig. 2 Tracking scenario of 4×6 m². The thick line is the light 
source. There are 10 nodes equipped with light sensors around the 
edges, indicated by squares. The entry to the scenario lies at the 
bottom-right corner. 


Table 3 displays values of the relevant simulation and 
algorithm parameters. The number of processing elements 
(N) is 4 in our experiments and we use N=1 as the equivalent 
to a centralized particle filter. Changing N affects other 
variables, such as the number of sensing-only elements (J-N) 
and the number of particles per PE (K=M/N). It does not 
matter which of the nodes are PEs and which are SEs, since 
we assume a fully connected network. Each node (either PE 
or SE) produces one binary observation every $T_s$ seconds. 
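
As a quick check of these relations, taking M = 100 particles and J = 10 nodes as listed in Table 3, the distributed configuration (N = 4) and the centralized equivalent (N = 1) give:

```latex
\begin{align*}
N = 4:&\quad K = \frac{M}{N} = \frac{100}{4} = 25, \qquad J - N = 10 - 4 = 6 \\
N = 1:&\quad K = \frac{M}{N} = \frac{100}{1} = 100, \qquad J - N = 10 - 1 = 9
\end{align*}
```

The total number of particles is the same in both cases, so any difference in accuracy comes from how the particles are split across nodes and not from their number.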

Figure 3 displays the empirical distribution of errors, and 
the average error, for 100 simulated paths. Figure 4 plots two 
of these paths along with the paths estimated by our 
SVM-based DPF. The discrepancies between true and 
estimated position tend to occur when the target moves 
between detection zones. Since the observations are binary 
and zone-based, rather than distance-based, there are gaps 
around the edges (see for example the final points in Figure 
4). Accuracy also tends to be higher nearer the light source 
where more detection zones overlap. 


Table 3. Simulation and algorithm parameters 


Variable                     Symbol   Value (unit) 
Number of PEs                N        4 
Number of nodes              J        10 
Number of SEs                         J-N 
Total number of particles    M        100 
Number of particles/PE       K        M/N 
Number of timesteps          T        20 (s) 
Sampling period              T_s      1 (s) 

Fig 3. Histogram of position error in meters for both the centralized 
(top) and our distributed (bottom) versions of the particle filter over 
100 simulated trajectories. 


Fig 4. The simulated (black) path for two simulations, and the 
corresponding SVM-based DPF-estimated path (red); over T=20 
time steps. 


Figure 5 displays the reduction in the volume of 
information propagated for updating the particle weights when 
the proposed method is used, for 100 simulated paths. The 
horizontal axis shows the simulation run and the vertical axis 
shows the amount of observations propagated (in percent) 
on the network compared to the case when SVM is not used. 
The results show that, with the proposed scheme, 
only 51.9% of the sensor observations are propagated on the 
network compared to the work done in [24], which 
reduces the energy consumption of the sensors. 
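
Expressed as a saving, and assuming that the 51.9% figure is the average over the 100 runs, the reduction in propagated observations is:

```latex
100\% - 51.9\% = 48.1\%
```

That is, a little under half of the observation traffic needed for the weight update is avoided.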


[Figure 5: vertical axis: amount of propagating observations (percent); horizontal axis: simulation run.] 

Fig 5. The reduction in the volume of information propagated 
for updating particle weights, using our proposed 
method, for 100 simulated paths. 

8. Conclusion 

In this paper, we have described the implementation of a 
distributed particle filter for target tracking in a wireless 
sensor network. One of the main limitations of similar works 
is the need to make all sensor observations available to every 
processing node. To overcome this limitation, we have used 
support vector machine to compress sensor observations. 
Simulation results show that the differences in accuracy between the 
proposed scheme and both the centralized particle filter and the 
distributed particle filter are insignificant, whereas by 
combining SVM with DPF we have reduced 
communication among the nodes by around 48%. Since SVMs 
only consider the points near the margin (the support vectors) 
instead of all data points, they are suitable for data 
compression. SVMs can produce accurate and robust 
classification results on a sound theoretical basis, even when 
the input data are non-monotone and not linearly separable. The 
biggest limitation of the support vector approach lies in the 
choice of the kernel function. In our work, we have used the RBF 
kernel in the training step since it has fewer numerical 
difficulties and performs better in nonlinear cases. 